Fast and Accurate Ground Truth Generation for Skew-Tolerance Evaluation of Page Segmentation Algorithms

نویسندگان

  • Oleg Okun
  • Matti Pietikäinen
چکیده

Many image segmentation algorithms are known, but often there is an inherent obstacle in the unbiased evaluation of segmentation quality: the absence or lack of a common objective representation for segmentation results. Such a representation, known as the ground truth, is a description of what one should obtain as the result of ideal segmentation, independently of the segmentation algorithm used. The creation of ground truth is a laborious process and therefore any degree of automation is always welcome. Document image analysis is one of the areas where ground truths are employed. In this paper, we describe an automated tool called GROTTO intended to generate ground truths for skewed document images, which can be used for the performance evaluation of page segmentation algorithms. Some of these algorithms are claimed to be insensitive to skew (tilt of text lines). However, this fact is usually supported only by a visual comparison of what one obtains and what one should obtain since ground truths are mostly available for upright images, that is, those without skew. As a result, the evaluation is both subjective; that is, prone to errors, and tedious. Our tool allows users to quickly and easily produce many sufficiently accurate ground truths that can be employed in practice and therefore it facilitates automatic performance evaluation. The main idea is to utilize the ground truths available for upright images and the concept of the representative square [9] in order to produce the ground truths for skewed images. The usefulness of our tool is demonstrated through a number of experiments with real-document images of complex layout.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Benchmarking page segmentation algorithms

A method for automatically evaluating the quality of document page segmentation algorithms is introduced. Many different zoning techniques are now available, but there exists no robust method to benchmark and evaluate them reliably. Our proposed strategy is a region-based approach, in which segmentation results are compared with manually generated "ground truth files", describing all possible c...

متن کامل

Methodology for Flexible and Efficient Analysis of the Performance of Page Segmentation Algorithms

This paper presents part of a new DIA performance analysis framework aimed at Layout Analysis algorithm developers. A new region-representation scheme (an interval-based description of isothetic polygons) and a corresponding comparison approach are introduced. These enable fast and accurate geometric comparison of ground-truth with results of page segmentation, improving on current evaluation m...

متن کامل

Automatic Ground-Truth Generation for Skew-Tolerance Evaluation of Document Layout Analysis Methods

Generation of ground-truths is of great importance for unbiased performance evaluation of document layout analysis methods. This is especially necessary because many methods are claimed to be skew-tolerant. However, experimental evaluation of this fact is often based only on human subjective judgement and restricted to a few experiments. The main obstacle for obtaining human-independent and mor...

متن کامل

A Proposed Scheme for Performance Evaluation of Graphics/Text Separation Algorithms

We propose an objective, comprehensive, and complexity independent metric for performance evaluation of graphics/text separation (text segmentation) algorithms. The metric includes a positive set and a negative set of indices, at both the character and the character string (text) levels, and it evaluates the detection accuracy of the location, width, height, orientation, skew, string length, an...

متن کامل

Evaluating SEE - A Benchmarking System for Document Page Segmentation

The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and improvement of page segmentation algorithms is systematic evaluation. The approaches known from the literature have the disadvantage that manually generated reference data (zoning ground truth) are needed for the evaluation ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Adv. Sig. Proc.

دوره 2006  شماره 

صفحات  -

تاریخ انتشار 2006